Simultaneous Discrimination Prevention and Privacy Protection in Data Publishing and Mining

نویسنده

  • Sara Hajian
چکیده

Data mining is an increasingly important technology for extracting useful knowledge hidden in large collections of data. There are, however, negative social perceptions about data mining, among which potential privacy violation and potential discrimination. The former is an unintentional or deliberate disclosure of a user profile or activity data as part of the output of a data mining algorithm or as a result of data sharing. For this reason, privacy preserving data mining has been introduced to trade off the utility of the resulting data/models for protecting individual privacy. The latter consists of treating people unfairly on the basis of their belonging to a specific group. Automated data collection and data mining techniques such as classification have paved the way to making automated decisions, like loan granting/denial, insurance premium computation, etc. If the training datasets are biased in what regards discriminatory attributes like gender, race, religion, etc., discriminatory decisions may ensue. For this reason, anti-discrimination techniques including discrimination discovery and prevention have been introduced in data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on discriminatory attributes. Indirect discrimination occurs when decisions are made based on non-discriminatory attributes which are strongly correlated with biased discriminatory ones. In the first part of this thesis, we tackle discrimination prevention in data mining and propose new techniques applicable for direct or indirect discrimination prevention individually or both at the same time. We discuss how to clean training datasets and outsourced datasets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (non-discriminatory) classification rules. The experimental evaluations

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارایه یک روش جدید انتشار داده‌ها با حفظ محرمانگی با هدف بهبود دقّت طبقه‌‌بندی روی داده‌های گمنام

Data collection and storage has been facilitated by the growth in electronic services, and has led to recording vast amounts of personal information in public and private organizations databases. These records often include sensitive personal information (such as income and diseases) and must be covered from others access. But in some cases, mining the data and extraction of knowledge from thes...

متن کامل

Effective Incentive Compatible Model for Privacy Preservation of Information in Secure Data Sharing and Publishing

Privacy preserving is one of the most important research topics in the data security field and it has become a serious concern in the secure transformation of personal data in recent years. For example, different credit card companies and disease control centers may try to build better data sharing or publishing models for privacy protection through privacy preserving data mining techniques (PP...

متن کامل

Discrimination Discovery and Prevention in Data Mining: A Survey

Data Mining is the computation process of discovering knowledge or patterns in large data sets. But extract knowledge without violation such as privacy and non-discrimination is most difficult and challenging. This is mainly because of data mining techniques such as classification rules are actually learned by the system from the training data and training data sets itself are biased in what re...

متن کامل

TrPLS: Preserving Privacy in Trajectory Data Publishing by Personalized Local Suppression

Trajectory data are becoming more popular due to the rapid development of mobile devices and the widespread use of location-based services. They often provide useful information that can be used for data mining tasks. However, a trajectory database may contain sensitive attributes, such as disease, job, and salary, which are associated with trajectory data. Hence, improper publishing of the tra...

متن کامل

Direct and Indirect Discrimination Prevention Methods

Along with privacy, discrimination is a very important issue when considering the legal and ethical aspects of data mining. It is more than obvious that most people do not want to be discriminated because of their gender, religion, nationality, age and so on, especially when those attributes are used for making decisions about them like giving them a job, loan, insurance, etc. Discovering such ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1306.6805  شماره 

صفحات  -

تاریخ انتشار 2013